{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "------------------\n",
    " 第三章：结构化数据 (2022/11/19)\n",
    "-----------------\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 字典"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 如何发现代码中的字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'Ford Prefect',\n",
       " 'Gender': 'Male',\n",
       " 'Occupation': 'Researcher',\n",
       " 'Home Planet': 'Betelgeuse Seven'}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "{'name':'Ford Prefect','Gender':'Male','Occupation':'Researcher','Home Planet':'Betelgeuse Seven'}   #大括号、值是字符串对象，值/键之间用逗号隔开，冒号对应值/键"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'Ford Prefect',\n",
       " 'Gender': 'Male',\n",
       " 'Occupation': 'Researcher',\n",
       " 'Home Planet': 'Betelgeuse Seven'}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "person3 = {'name':'Ford Prefect','Gender':'Male','Occupation':'Researcher','Home Planet':'Betelgeuse Seven'}\n",
    "person3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 字典不会维持插入顺序（无序性）\n",
    ">> 要使用键来查找值（列表使用索引值查找数据）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'Betelgeuse Seven'"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "person3['Home Planet']  # 使用键查找数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 运行时处理数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 1、向字典增加一行数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "person3\n",
    "person3['Age'] = 33   #  向字典增加一行数据，将一个对象赋至一个新键"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'name': 'Ford Prefect',\n",
       " 'Gender': 'Male',\n",
       " 'Occupation': 'Researcher',\n",
       " 'Home Planet': 'Betelgeuse Seven',\n",
       " 'Age': 33}"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "person3"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---------------------------"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "输入单词mihaoj\n",
      "i\n",
      "a\n",
      "o\n"
     ]
    }
   ],
   "source": [
    "vowels = ['a','e','i','o','u']  \n",
    "word =input(\"输入单词\")    \n",
    "found = []\n",
    "for letter in word:\n",
    "    if letter in vowels:\n",
    "        if letter not in found:\n",
    "            found.append(letter)\n",
    "for vowel in found:\n",
    "    print(vowel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 进行迭代：报告元音出现的频度 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "found = {}    #初始为空的字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a': 0, 'e': 0, 'i': 0, 'o': 0, 'u': 0}"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "found['a'] = 0\n",
    "found['e'] = 0\n",
    "found['i'] = 0\n",
    "found['o'] = 0\n",
    "found['u'] = 0   \n",
    "found         # 将所有元音计数初始值为0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "found['e'] = found['e'] + 1  # 递增e的计数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "found['e'] += 1    # 递增e的计数\n",
    "# Python支持+=操作符，形式更简洁。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a': 0, 'e': 1, 'i': 0, 'o': 0, 'u': 0}"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "found"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 迭代处理字典"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a\n",
      "e\n",
      "i\n",
      "o\n",
      "u\n"
     ]
    }
   ],
   "source": [
    "# 用for循环迭代一个字典时，解释器只处理字典的键\n",
    "for kv in found:\n",
    "    print(kv)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a was found 0 time(s).\n",
      "e was found 1 time(s).\n",
      "i was found 0 time(s).\n",
      "o was found 0 time(s).\n",
      "u was found 0 time(s).\n"
     ]
    }
   ],
   "source": [
    "for k in found:\n",
    "    print(k,'was found',found[k],'time(s).')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a was found 0 time(s).\n",
      "e was found 1 time(s).\n",
      "i was found 0 time(s).\n",
      "o was found 0 time(s).\n",
      "u was found 0 time(s).\n"
     ]
    }
   ],
   "source": [
    "# 利用内置函数sorted可以顺序生成输出  利用键来确定排序（A-Z）\n",
    "for k in sorted(found):\n",
    "    print(k,'was found',found[k],'time(s).')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a was found 0 time(s).\n",
      "e was found 1 time(s).\n",
      "i was found 0 time(s).\n",
      "o was found 0 time(s).\n",
      "u was found 0 time(s).\n"
     ]
    }
   ],
   "source": [
    "#  用items迭代处理字典\n",
    "for k,v in sorted(found.items()):    # 在found字典中调用items函数，传回看k，v两个循环变量\n",
    "    print(k,'was found',v,'time(s).')   # 输出一样，但是这个代码更容易读"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "输入单词input\n",
      "a was found 0 time(s).\n",
      "e was found 0 time(s).\n",
      "i was found 1 time(s).\n",
      "o was found 0 time(s).\n",
      "u was found 1 time(s).\n"
     ]
    }
   ],
   "source": [
    "#  迭代频数计数\n",
    "vowels = ['a','e','i','o','u']  \n",
    "word =input(\"输入单词\")  \n",
    "found = {} \n",
    "found['a'] = 0\n",
    "found['e'] = 0\n",
    "found['i'] = 0\n",
    "found['o'] = 0\n",
    "found['u'] = 0\n",
    "for letter in word:\n",
    "    if letter in vowels:\n",
    "        found[letter] += 1\n",
    "for k,v in sorted(found.items()):\n",
    "    print(k,'was found',v,'time(s).')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "输入单词input\n"
     ]
    },
    {
     "ename": "KeyError",
     "evalue": "'i'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mKeyError\u001b[0m                                  Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-1-91e1f73d080b>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[0;32m      4\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mletter\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mword\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m     \u001b[1;32mif\u001b[0m \u001b[0mletter\u001b[0m \u001b[1;32min\u001b[0m \u001b[0mvowels\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 6\u001b[1;33m         \u001b[0mfound\u001b[0m\u001b[1;33m[\u001b[0m\u001b[0mletter\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m+=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      7\u001b[0m \u001b[1;32mfor\u001b[0m \u001b[0mk\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mv\u001b[0m \u001b[1;32min\u001b[0m \u001b[0msorted\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mfound\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mitems\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      8\u001b[0m     \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mk\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'was found'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[0mv\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'time(s).'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mKeyError\u001b[0m: 'i'"
     ]
    }
   ],
   "source": [
    "# 进行迭代，使得值为0的键不打印\n",
    "vowels = ['a','e','i','o','u']  \n",
    "word =input(\"输入单词\")  \n",
    "found = {}                       # 方法不可行\n",
    "for letter in word:\n",
    "    if letter in vowels:\n",
    "        found[letter] += 1\n",
    "for k,v in sorted(found.items()):\n",
    "    print(k,'was found',v,'time(s).')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 用in检查成员关系"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'apple': 10}"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruits = {}\n",
    "fruits['apple'] = 10\n",
    "fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'apple': 10, 'bananas': 1}"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "if 'bananas' in fruits:\n",
    "    fruits['bananas'] += 1       # 查看bananas键是否在字典中，如果不在为这个键的值初始化为1\n",
    "else:\n",
    "    fruits['bananas'] = 1\n",
    "fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'apple': 10, 'bananas': 2}"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "if 'bananas' in fruits:\n",
    "    fruits['bananas'] += 1\n",
    "else:                       # 再次执行，将值增1\n",
    "    fruits['bananas'] = 1\n",
    "fruits "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* 用not in代替in"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'apple': 10, 'bananas': 2, 'pears': 1}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "if 'pears' not in fruits:\n",
    "    fruits['pears'] = 0   \n",
    "fruits['pears'] += 1   #递增\n",
    "fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'apple': 10, 'bananas': 2, 'pears': 2}"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "fruits.setdefault('pears',0)         # Python提供了setdefault方法代替not in （不会容易出错）\n",
    "fruits['pears'] += 1   #递增\n",
    "fruits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "输入单词input\n",
      "i was found 1 time(s).\n",
      "u was found 1 time(s).\n"
     ]
    }
   ],
   "source": [
    "# 进行迭代，使得值为0的键不打印\n",
    "vowels = ['a','e','i','o','u']  \n",
    "word =input(\"输入单词\")  \n",
    "found = {}\n",
    "for letter in word:\n",
    "    if letter in vowels:\n",
    "        found.setdefault(letter,0)\n",
    "        found[letter]  += 1\n",
    "for k,v in sorted(found.items()):\n",
    "    print(k,'was found',v,'time(s).')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">>1. dict.setdefault()如果键在字典中，返回这个键所对应的值。如果键不在字典中，向字典中插入这个键\n",
    ">>2. sorted内置函数可以按照顺序（a-z）输入值键 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">1. 字典是无序的，不会维持插入的顺序，需要排序可以使用sorted内置函数\n",
    ">2. items方法允许按行迭代处理一个字典，一次迭代中items方法会向for循环返回下一个键和它的关联值\n",
    ">3. 如果试图访问一个字典不存在的键，会发生一个keyerror\n",
    ">4. 访问一个键之前，可以通过确保每个键都有一个关联值来避免错误，in和not in 可以提供帮助，但是成熟的技术是用setdefault方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 集合"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">1. 集合不允许有重复\n",
    ">2. 发现代码中的集合\n",
    ">>1. 大括号\n",
    ">>2. 对象之间用逗号隔开\n",
    ">>3. 没有重复值、不会维持插入的顺序（sorted函数）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'e', 'i', 'o', 'u'}"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vowels = {'a','e','i','o','u'}\n",
    "vowels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'e', 'i', 'o', 'u'}"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vowels = set('aeeiouu')    # 用set函数传递任意序列，快速生成一个集合\n",
    "vowels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'hello'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "word = 'hello'\n",
    "word"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## union合并集合"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "u = vowels.union(set(word))    # union并集，将一个集合与另一个集合合并，再把合并结果赋给一个新变量 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'e', 'h', 'i', 'l', 'o', 'u'}"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "u"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['a', 'e', 'h', 'i', 'l', 'o', 'u']"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "u_list = sorted(list(u))   # 将集合转化为有序列表\n",
    "u_list"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## difference找出不是共有的元素"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'a', 'i', 'u'}"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = vowels.difference(set(word))   \n",
    "d        #用difference函数将vowels中的对象与set(word)中的对象进行比较，返回新的集合d，包含在vowels集合中但不在set(word)中"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## intersection报告共同对象"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'e', 'o'}"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "i = vowels.intersection(set(word))\n",
    "i"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">1. 集合不允许有重复\n",
    ">2. 可以利用set快速创建一个元素集合\n",
    ">3. sorted函数可以排序\n",
    ">4. 并集（union）、差集（difference）、交集（intersection）\n",
    ">>1. u = vowels.union(set(word))\n",
    ">>2. d = vowels.difference(set(word))\n",
    ">>3. i = vowels.intersection(set(word))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "输入单词input\n",
      "u\n",
      "i\n"
     ]
    }
   ],
   "source": [
    "vowels = set('aeiou')\n",
    "word = input('输入单词')\n",
    "found = vowels.intersection(set(word))\n",
    "for vowel in found:\n",
    "    print(vowel)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
